Neural Network-based Language Model for Conversational Telephone Speech Recognition
Preface
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. I hereby declare that my thesis does not exceed the limit of length prescribed in the Special Regulations of the M. Phil. examination for which I am a candidate. The length of my thesis is 14,980 words.
Acknowledgements
I would like to thank Professor Phil Woodland for his help and guidance over the course of this project. I would also like to thank David Mrva for providing a customized version of the LPlex tool capable of summing over words in a specified shortlist.
Abstract
This thesis presents a large-scale neural network language model for telephone conversation transcriptions. The network maps n-gram contexts to a continuous vector space and is trained with softmax normalization so that it operates as a probability estimator. The smoothness of the resulting distributions yields consistently lower perplexity for restricted subsets of the vocabulary. Excessive training time is a major issue, so optimized linear algebra libraries are used to implement the feedforward and backpropagation passes efficiently during training. A word-class interpretation of the network inputs and outputs is shown to improve perplexity over the n-gram model when training data is limited.
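The abstract's central idea, mapping an n-gram context to continuous vectors and normalizing the output with a softmax, can be illustrated with a minimal sketch. This is not the thesis implementation: the class `TinyNNLM`, the helper names, the layer sizes, and the random initialization are all hypothetical, biases and training are omitted, and plain Python lists stand in for the optimized linear algebra libraries the thesis relies on.

```python
import math
import random


def softmax(scores):
    """Normalize raw scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyNNLM:
    """Hypothetical feedforward language model (illustration only, untrained)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, context_size, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.embeddings = init(vocab_size, embed_dim)  # one vector per word
        self.hidden_weights = init(hidden_dim, context_size * embed_dim)
        self.output_weights = init(vocab_size, hidden_dim)

    def predict(self, context):
        """Return P(next word | context) for every word in the vocabulary."""
        # Concatenate the continuous vectors of the context words.
        x = [v for word_id in context for v in self.embeddings[word_id]]
        hidden = [math.tanh(a) for a in matvec(self.hidden_weights, x)]
        return softmax(matvec(self.output_weights, hidden))


def perplexity(model, word_ids, context_size):
    """Perplexity of the model over a sequence of word ids."""
    log_prob = 0.0
    count = 0
    for i in range(context_size, len(word_ids)):
        probs = model.predict(word_ids[i - context_size:i])
        log_prob += math.log(probs[word_ids[i]])
        count += 1
    return math.exp(-log_prob / count)


model = TinyNNLM(vocab_size=5, embed_dim=3, hidden_dim=4, context_size=2)
probs = model.predict([1, 3])  # distribution over the 5-word toy vocabulary
ppl = perplexity(model, [0, 1, 2, 3, 4, 0, 1], context_size=2)
```

Because the weights here are small and untrained, the predicted distribution is close to uniform and the perplexity is close to the vocabulary size; training with backpropagation would be needed to do better.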
Similar resources
Improving English Conversational Telephone Speech Recognition
The goal of this work is to build a state-of-the-art English conversational telephone speech recognition system. We investigated several techniques to improve acoustic modeling, namely speaker-dependent bottleneck features, deep Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks, data augmentation and score fusion of DNN and BLSTM models. Training set consisted of the 300 ho...
The IBM 2015 English conversational telephone speech recognition system
We describe the latest improvements to the IBM English conversational telephone speech recognition system. Some of the techniques that were found beneficial are: maxout networks with annealed dropout rates; networks with a very large number of outputs trained on 2000 hours of data; joint modeling of partially unfolded recurrent neural networks and convolutional nets by combining the bottleneck ...
Lexicon-Free Conversational Speech Recognition with Neural Networks
We present an approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure. This approach eliminates much of the complex infrastructure of modern speech recognition systems, making it possible to directly train a speech recognizer using errors generated by spoken language understanding ...
The IBM 2016 English Conversational Telephone Speech Recognition System
We describe a collection of acoustic and language modeling techniques that lowered the word error rate of our English conversational telephone LVCSR system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation testset. On the acoustic side, we use a score fusion of three strong models: recurrent nets with maxout activations, very deep convolutional nets with 3x3 kernels, and bi...
Conversational telephone speech recognition
This paper describes the development of a speech recognition system for the processing of telephone conversations, starting with a state-of-the-art broadcast news transcription system. We identify major changes and improvements in acoustic and language modeling, as well as decoding, which are required to achieve state-of-the-art performance on conversational speech. Some major changes on the aco...
Hierarchies of neural networks for connectionist speech recognition
We present a principled framework for context-dependent hierarchical connectionist HMM speech recognition. Based on a divide-and-conquer strategy, our approach uses an Agglomerative Clustering algorithm based on Information Divergence (ACID) to automatically design a soft classifier tree for an arbitrarily large number of HMM states. Nodes in the classifier tree are instantiated with small estimator...
Publication date: 2005